123 research outputs found

    An automated classification approach to ranking photospheric proxies of magnetic energy build-up

    We study the photospheric magnetic field of ~2000 active regions in solar cycle 23 to search for parameters indicative of energy build-up and subsequent release as a solar flare. We extract three sets of parameters: snapshots in space and time (total flux, magnetic gradients, and neutral lines); evolution in time (flux evolution); and structures at multiple size scales (wavelet analysis). We combine pattern recognition and classification techniques via a relevance vector machine to determine whether a region will flare. We consider classification performance using all 38 extracted features and several feature subsets. Classification performance is quantified using both the true positive rate (TPR) and the true negative rate (TNR). Additionally, we compute the true skill score, which gives equal weighting to the TPR and the TNR, and the Heidke skill score, to allow comparison with other flare forecasting work. We obtain a true skill score of ~0.5 for any predictive time window in the range 2-24 hr, with a TPR of ~0.8 and a TNR of ~0.7. These values do not appear to depend on the time window, although the Heidke skill score (<0.5) does. Features relating to snapshots of the distribution of magnetic gradients show the best predictive ability over all predictive time windows. Other gradient-related features and the instantaneous power at various wavelet scales also appear among the five features ranked highest in predictive power. While the photospheric magnetic field governs the coronal non-potentiality (and likelihood of flaring), the photospheric magnetic field alone is not sufficient to determine this uniquely. Furthermore, we are only measuring proxies of the magnetic energy build-up. We still lack observational details on why energy is released at any particular point in time. We may have discovered the natural limit of the accuracy of flare predictions from these large-scale studies.
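
The skill scores named in this abstract can be computed from a confusion matrix. The following is a minimal sketch, not the authors' code: the function name `skill_scores` and the example counts are hypothetical, chosen only so that the TPR and TNR match the approximate values quoted above.

```python
def skill_scores(tp, fn, fp, tn):
    """Return (TPR, TNR, TSS, HSS) for a binary flare / no-flare forecast."""
    tpr = tp / (tp + fn)          # true positive rate
    tnr = tn / (tn + fp)          # true negative rate
    tss = tpr + tnr - 1.0         # true skill score: equal weight on TPR and TNR
    # Heidke skill score: improvement over the number correct by chance
    hss = 2.0 * (tp * tn - fn * fp) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    )
    return tpr, tnr, tss, hss


# Hypothetical counts: 100 flaring and 1000 non-flaring regions.
tpr, tnr, tss, hss = skill_scores(tp=80, fn=20, fp=300, tn=700)
print(f"TPR={tpr:.2f} TNR={tnr:.2f} TSS={tss:.2f} HSS={hss:.2f}")
```

With these assumed counts the true skill score is about 0.5 while the Heidke skill score is much lower, because the Heidke score depends on the ratio of flaring to non-flaring samples and the true skill score does not.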

    Learning with Biased Complementary Labels

    In this paper, we study the classification problem in which we have access to an easily obtainable surrogate for true labels, namely complementary labels, which specify classes that observations do not belong to. Let $Y$ and $\bar{Y}$ be the true and complementary labels, respectively. We first model the annotation of complementary labels via transition probabilities $P(\bar{Y}=i \mid Y=j),\ i\neq j\in\{1,\cdots,c\}$, where $c$ is the number of classes. Previous methods implicitly assume that $P(\bar{Y}=i \mid Y=j),\ \forall i\neq j$, are identical, which is not true in practice because humans are biased toward their own experience. For example, as shown in Figure 1, if an annotator is more familiar with monkeys than prairie dogs when providing complementary labels for meerkats, she is more likely to employ "monkey" as a complementary label. We therefore reason that the transition probabilities will be different. In this paper, we propose a framework that contributes three main innovations to learning with biased complementary labels: (1) It estimates transition probabilities with no bias. (2) It provides a general method to modify traditional loss functions and extends standard deep neural network classifiers to learn with biased complementary labels. (3) It theoretically ensures that the classifier learned with complementary labels converges to the optimal one learned with true labels. Comprehensive experiments on several benchmark datasets validate the superiority of our method to current state-of-the-art methods. Comment: ECCV 2018 Ora
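
One way to picture how transition probabilities can modify a standard loss is to push the classifier's class posterior through the transition matrix and apply cross-entropy to the complementary label. The sketch below illustrates that general idea under stated assumptions and is not claimed to be the authors' exact formulation: the names `softmax`, `complementary_cross_entropy`, and `Q`, and the numbers in `Q`, are hypothetical, with `Q[j][i]` standing for $P(\bar{Y}=i \mid Y=j)$ and a zero diagonal.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def complementary_cross_entropy(logits, comp_label, Q):
    """Cross-entropy on a complementary label.

    P(Ybar = i | x) = sum_j P(Y = j | x) * Q[j][i], where Q[j][i] is the
    assumed transition probability P(Ybar = i | Y = j).
    """
    p_true = softmax(logits)
    c = len(p_true)
    p_comp = [sum(p_true[j] * Q[j][i] for j in range(c)) for i in range(c)]
    return -math.log(p_comp[comp_label] + 1e-12)


# Hypothetical biased transition matrix for c = 3 classes (rows sum to 1).
Q = [
    [0.0, 0.7, 0.3],
    [0.5, 0.0, 0.5],
    [0.2, 0.8, 0.0],
]
loss = complementary_cross_entropy([0.0, 0.0, 0.0], comp_label=1, Q=Q)
print(round(loss, 4))
```

A uniform `Q` (all off-diagonal entries equal to 1/(c-1)) would correspond to the unbiased assumption that the abstract attributes to previous methods.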

    Retrospective suspect and non-target screening combined with similarity measures to prioritize MDMA and amphetamine synthesis markers in wastewater

    3,4-Methylenedioxymethamphetamine (MDMA) and amphetamine are commonly used psychoactive stimulants. Illegal manufacture of these substances, mainly located in the Netherlands and Belgium, generates large amounts of chemical waste, which is disposed of in the environment or released into sewer systems. Retrospective analysis of high-resolution mass spectrometry (HRMS) data was implemented to detect synthesis markers of MDMA and amphetamine production in wastewater samples. Specifically, suspect and non-target screening was combined with a prioritization approach based on similarity measures between detected features and mass loads of MDMA and amphetamine. Two hundred and thirty-five 24 h composite wastewater samples collected from a treatment plant in the Netherlands between 2016 and 2018 were analyzed by liquid chromatography coupled to high-resolution mass spectrometry. Samples were initially separated into two groups (i.e., baseline consumption versus dumping) based on daily loads of MDMA and amphetamine. Significance testing and fold-changes were used to find differences between features in the two groups. Then, associations between peak areas of all features and MDMA or amphetamine loads were investigated across the whole time series using various measures (Euclidean distance, Pearson's correlation coefficient, Spearman's rank correlation coefficient, distance correlation and maximum information coefficient). This unsupervised and unbiased approach was used for prioritization of features and allowed the selection of 28 presumed markers of production of MDMA and amphetamine. These markers could potentially be used to detect dumps in sewer systems, help determine the synthesis route, and track down the waste in the environment.
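
Two of the similarity measures listed in this abstract, Pearson's and Spearman's correlation, can be sketched in a few lines. This is an illustration only: the helper names and the two short time series are invented for the example and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented daily values: a drug load with one dumping spike, and the peak
# area of a candidate feature that rises with it non-linearly.
load = [1.0, 2.0, 3.0, 4.0, 50.0]
feature_area = [2.0, 3.0, 5.0, 9.0, 400.0]
print(pearson(load, feature_area), spearman(load, feature_area))
```

Because the invented feature rises monotonically with the load, its Spearman coefficient is 1 even though the relationship is not linear, which is one reason to compare several measures rather than rely on a single one.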

    Structured Random Matrices

    Random matrix theory is a well-developed area of probability theory that has numerous connections with other areas of mathematics and its applications. Much of the literature in this area is concerned with matrices that possess many exact or approximate symmetries, such as matrices with i.i.d. entries, for which precise analytic results and limit theorems are available. Much less well understood are matrices that are endowed with an arbitrary structure, such as sparse Wigner matrices or matrices whose entries possess a given variance pattern. The challenge in investigating such structured random matrices is to understand how the given structure of the matrix is reflected in its spectral properties. This chapter reviews a number of recent results, methods, and open problems in this direction, with a particular emphasis on sharp spectral norm inequalities for Gaussian random matrices. Comment: 46 pages; to appear in IMA Volume "Discrete Structures: Analysis and Applications" (Springer

    PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers

    The aim of this paper is to generalize the PAC-Bayesian theorems proved by Catoni in the classification setting to more general problems of statistical inference. We show how to control the deviations of the risk of randomized estimators. Particular attention is paid to randomized estimators drawn in a small neighborhood of classical estimators, whose study leads to control of the risk of the latter. These results make it possible to bound the risk of very general estimation procedures, as well as to perform model selection.

    Maximum-Reward Motion in a Stochastic Environment: The Nonequilibrium Statistical Mechanics Perspective

    We consider the problem of computing the maximum-reward motion in a reward field in an online setting. We assume that the robot has a limited perception range and discovers the reward field on the fly. We analyze the performance of a simple, practical lattice-based algorithm with respect to the perception range. Our main result is that, with very little perception range, the robot can collect as much reward as if it could see the whole reward field, under certain assumptions. Along the way, we establish novel connections between this class of problems and certain fundamental problems of nonequilibrium statistical mechanics. We demonstrate our results in simulation examples.

    A population Monte Carlo scheme with transformed weights and its application to stochastic kinetic models

    This paper addresses the problem of Monte Carlo approximation of posterior probability distributions. In particular, we have considered a recently proposed technique known as population Monte Carlo (PMC), which is based on an iterative importance sampling approach. An important drawback of this methodology is the degeneracy of the importance weights when the dimension of either the observations or the variables of interest is high. To alleviate this difficulty, we propose a novel method that performs a nonlinear transformation on the importance weights. This operation reduces the weight variation; hence, it avoids their degeneracy and increases the efficiency of the importance sampling scheme, especially when drawing from proposal functions that are poorly adapted to the true posterior. For the sake of illustration, we have applied the proposed algorithm to the estimation of the parameters of a Gaussian mixture model. This is a very simple problem that enables us to clearly show and discuss the main features of the proposed technique. As a practical application, we have also considered the popular (and challenging) problem of estimating the rate parameters of stochastic kinetic models (SKMs). SKMs are highly multivariate systems that model molecular interactions in biological and chemical problems. We introduce a particularization of the proposed algorithm to SKMs and present numerical results. Comment: 35 pages, 8 figures
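
To make weight degeneracy and its mitigation concrete, the sketch below clips the largest log-weights to a threshold and compares the effective sample size before and after. The abstract does not say which nonlinear transformation the authors use; clipping is assumed here purely for illustration, and the function names, the parameter `k`, and the log-weights are hypothetical.

```python
import math


def normalize(log_weights):
    """Turn unnormalized log-weights into weights that sum to 1."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2); close to 1 when a single weight dominates."""
    return 1.0 / sum(x * x for x in weights)


def clip_log_weights(log_weights, k):
    """Cap every log-weight at the k-th largest value (assumed transformation)."""
    threshold = sorted(log_weights, reverse=True)[k - 1]
    return [min(lw, threshold) for lw in log_weights]


# Hypothetical degenerate log-weights: one sample dominates all the others.
log_w = [0.0, -10.0, -11.0, -12.0, -30.0]
ess_before = effective_sample_size(normalize(log_w))
ess_after = effective_sample_size(normalize(clip_log_weights(log_w, k=3)))
print(ess_before, ess_after)
```

In this toy case the effective sample size rises from roughly one sample to more than three, at the cost of a bias introduced by altering the weights.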

    Utility of multispectral imaging for nuclear classification of routine clinical histopathology imagery

    Background: We present an analysis of the utility of multispectral versus standard RGB imagery for routine H&E stained histopathology images, in particular for pixel-level classification of nuclei. Our multispectral imagery has 29 spectral bands, spaced 10 nm apart within the visual range of 420–700 nm. It has been hypothesized that the additional spectral bands contain further information useful for classification as compared to the 3 standard bands of RGB imagery. We present analyses of our data designed to test this hypothesis. Results: For classification using all available image bands, we find the best performance (equal tradeoff between detection rate and false alarm rate) is obtained from either the multispectral or our "ccd" RGB imagery, with an overall increase in performance of 0.79% compared to the next best performing image type. For classification using single image bands, the single best multispectral band (in the red portion of the spectrum) gave a performance increase of 0.57% compared to the single best RGB band (red). Additionally, red bands had the highest coefficients/preference in our classifiers. Principal components analysis of the multispectral imagery indicates only two significant image bands, which is not surprising given the presence of two stains. Conclusion: Our results indicate that, for a pixel-level nuclear classification task, multispectral imagery of routine H&E stained histopathology provides minimal additional spectral information compared with standard RGB imagery.